Differential (Predictive) Coding

Motivation

  • Usually happens that the differences between consecutive PCM (Pulse-Code Modulation) [A.V. Oppenheim, 1999] samples

    $$ e[n] = s[n] - s[n-1], $$

    tend to have a smaller variance (and therefore, smaller entropy) than the original ($s$) ones

    $$ \sigma^2(e) \leq \sigma^2(s), $$

    which potentially provides better lossless compression ratios for $e$ than for $s$.

  • Finally, notice that by definition, if a source of "noise" (such as the quantization noise) has not been introduced during the encoding process, differential encoding is a fully reversible process:

    $$ s[n] = e[n] + s[n-1]. $$

Lab

  • Compute $e$ signal for the Jfk_berlin_address_high.ogg $s$ signal and plot the probability density function (also known as the histogram) of $e$. Note: use Python and Matplotlib, and insert the code here.
In [ ]:
import os.path as path

if path.exists('Jfk_berlin_address_high.ogg'):
    print('The file exists!!!')
else:
    !curl -O https://upload.wikimedia.org/wikipedia/commons/3/3a/Jfk_berlin_address_high.ogg
    print('Download successful!!!')
In [ ]:
#Use ffmpeg, you can install it on Anaconda(https://anaconda.org/menpo/ffmpeg)
#Convert OGG to WAV
import os.path as path

if path.exists('Jfk_berlin_address_high.wav'):
    print('The file exists!!!')
else:
    !ffmpeg -i Jfk_berlin_address_high.ogg Jfk_berlin_address_high.wav
    print('Conversion successful!!!')
In [5]:
import matplotlib.pyplot as plt
import pylab as P
from scipy.io import wavfile as wav
import numpy as np
from IPython.display import Math
import IPython.display as ipd

#Read wave
rated, data = wav.read('Jfk_berlin_address_high.wav')
#Keep only 2 min because wave is too long
signal = data[0:int(120*rated)]
#This wave have 2 channels so in this example only use one
soloC = signal[:,0]
#Compute e_signal
e_signal = np.append(soloC[0], soloC[1:] - soloC[:-1])
#Recompute original
recompute = np.append(e_signal[0], e_signal[1:] + soloC[:-1])
#Variance
varSignal = np.var(soloC)
varE = np.var(e_signal)
#Check origianl signal with recompute signal 
c = np.all(np.array(soloC) == np.array(recompute))
            
#Output
display(Math(r'\text{Original signal} s[n]'))
print(soloC)
display(Math(r'\text{For compute the e signal we use the following definition:} e[n]=s[n]-s[n-1]'))
display(Math(r'\text{And we get this:} e[n]'))
print(e_signal)
display(Math(r'\text{Now we check the variance:} \sigma^2(e)\leq\sigma^2(s)'))
print('Variance s',varSignal)
print('Variance e',varE)
if (varE <= varSignal ): display(Math(r'\text{Variance of the signal} e \text{is less than the variance of signal} s'))
display(Math(r'\text{We can recompute the original signal use this definition:} s[n]=e[n]+s[n-1]'))
display(Math(r'\text{Result:} s[n]'))
print('Recompute signal', recompute)
display(Math(r'\text{We can verify the original signal and the recompute signal are equal}'))
if (c == True) : display(Math(r'\text{Recompute signal is equal to original}'))

display(Math(r'\text{Graphics}'))    
%matplotlib inline
plt.figure()
plt.plot(soloC)
plt.title('S[n] time domain')
plt.xlabel('Time/s')
plt.ylabel('Amplitude')
plt.figure()
plt.hist(soloC,50)
plt.title('S[n] histogram-probability density function')
plt.figure()
plt.plot(e_signal)
plt.title('e[n] time domain')
plt.xlabel('Time/s')
plt.ylabel('Amplitude')
plt.figure()
plt.hist(e_signal,50)
plt.title('e[n] histogram-probability density function')
plt.show()

#Audio Demo notebook jupyter we need use notebook 5.3.1 if not you cant heard the audio
#If you have last version of notebook we dont need restrict the time
#Original
display(Math(r'\text{Audio only a demo 24s for a subjective comparison}'))
display(Math(r'\text{Original}')) 
original = soloC[0:int(24*rated)]
ipd.display(ipd.Audio(data=original,rate=rated))
#Recompute
display(Math(r'\text{Recompute Signal}'))
recomp = recompute[0:int(24*rated)]
ipd.display(ipd.Audio(data=recomp,rate=rated))
display(Math(r'\text{E Signal}'))
e = e_signal[0:int(24*rated)]
ipd.display(ipd.Audio(data=e,rate=rated))
$$\text{Original signal} s[n]$$
[   0    0    0 ..., 5632 5295 4750]
$$\text{For compute the e signal we use the following definition:} e[n]=s[n]-s[n-1]$$
$$\text{And we get this:} e[n]$$
[   0    0    0 ..., -117 -337 -545]
$$\text{Now we check the variance:} \sigma^2(e)\leq\sigma^2(s)$$
Variance s 16284125.0537
Variance e 249304.528952
$$\text{Variance of the signal} e \text{is less than the variance of signal} s$$
$$\text{We can recompute the original signal use this definition:} s[n]=e[n]+s[n-1]$$
$$\text{Result:} s[n]$$
Recompute signal [   0    0    0 ..., 5632 5295 4750]
$$\text{We can verify the original signal and the recompute signal are equal}$$
$$\text{Recompute signal is equal to original}$$
$$\text{Graphics}$$
$$\text{Audio only a demo 24s for a subjective comparison}$$
$$\text{Original}$$
$$\text{Recompute Signal}$$
$$\text{E Signal}$$

DPCM (Differential PCM) [Cutler, 1952]

  • Signals sampled at Nyquist rate exhibit correlation between consecutive samples. Therefore, the variance of the first difference

    \begin{equation} \sigma^2(\{s[n+1]-s[n]\}_{n=0}^{N-1}) \end{equation}

    will be smaller than the variance of the signal itself

    \begin{equation} \sigma^2(\{s[n+1]-s[n]\}_{n=0}^{N-1}) < \sigma^2(s) = \sigma_s^2. \end{equation}

  • In general, DPCM consist in computing the residual (error signal):

    \begin{equation} e[n] = s[n] - \hat{s}[n] \end{equation}

    where for the simplest case:

    \begin{equation} \hat{s}[n] = s[n-1] \end{equation}

  • Used in the G.726 standard.

ADPCM (Adaptive DPCM)

  • In orter to minimize $\sigma^2_e$, the predictor $P$ can change its behaviour depending on the characteristics of $s$.

Forward adaptive prediction

  • Splits the input into blocks, optimizes the predictor for each block and sends with each ADPCM block the predictor coefficients as side information.

Optimal prediction based on Weiner-Hopf optimization

  • The predictor can be any structure capable of producing a signal

    \begin{equation} \hat{s} \approx s. \end{equation}

  • In the case of using LPC (Linear Predictive Coding), where

    \begin{equation} \hat{s}[n] = \sum_{i=1}^M a_i s[n-i], \end{equation}

    $\{a_i\}_{i=1}^M$ are the LPC coefficients and $M$ is the predictor order.

  • Optimal foeffs $\{a_i\}$ can be found when they minimize the variance of the residue (error signal)

    \begin{equation} \sigma_e^2 = \text{E}\big[(s[n]-\sum_{i=1}^M a_i s[n-i])^2\big]. \end{equation}

  • For this, it must me hold that

    \begin{equation} \left.

    \begin{array}{c}
      \displaystyle\frac{\partial\sigma_e^2}{\partial a_1} = -2\text{E}\Big[\big(s[n]-\displaystyle\sum_{i=1}^M a_i s[n-i]\big)s[n-1]\Big] = 0 \\
      \displaystyle\frac{\partial\sigma_e^2}{\partial a_2} = -2\text{E}\Big[\big(s[n]-\displaystyle\sum_{i=1}^M a_i s[n-i]\big)s[n-2\Big] = 0 \\
      \vdots \\
      \displaystyle\frac{\partial\sigma_e^2}{\partial a_M} = -2\text{E}\Big[\big(s[n]-\displaystyle\sum_{i=1}^M a_i s[n-i]\big)s[n-M]\Big] = 0 
    \end{array}
    

    \right\rbrace. \end{equation}

  • Appying expectations we get that

    \begin{equation} \left.

    \begin{array}{c}
      \displaystyle\sum_{i=1}^M a_i \text{R}_{ss}(i-1) = \text{R}_{ss}(1) \\
      \displaystyle\sum_{i=1}^M a_i \text{R}_{ss}(i-2) = \text{R}_{ss}(2) \\
      \vdots\\
      \displaystyle\sum_{i=1}^M a_i \text{R}_{ss}(i-M) = \text{R}_{ss}(M) \\
    \end{array}
    

    \right\rbrace, \tag{Weiner-Hopf equations} \label{Weiner-Hopf equations} \end{equation}

    where $\text{R}_{ss}(k)$ is the autorrelation function of $s$, defined as

    \begin{equation} \text{R}_{ss}(k) = \text{E}\big[s[n]s[n-k]\big] = \frac{1}{M-k}\sum_{i=1}^{M-k} s[i]s[i+k]. \tag{autocorrelation} \label{autocorrelation} \end{equation}

Lab

For each $M=\{1,2,3\}$, determine the optimal predictor for the audio Jfk_berlin_address_high.ogg, using Weiner-Hopf optimization. Next, compute the variance of $e$ for each prediction order.

Backward adaptive prediction

  • Both, the encoder and the decoder can constinuosly adapt the prediction to the characteristics of $s$, if both only use the information that is available at the decoder.

Adaptive prediction based on Least Mean Squared (LMS) algorithm

  • As we know,

    \begin{equation} e[n] = s[n] - \hat{s}[n] = s[n] - \sum_{i=1}^M a_is[n-i]. \end{equation}

  • Our objective is to minimize $e[n]$, which is the same that minimizing the energy of the prediction error $e^2$, by controlling the $\{a_i\}_{i=1}^M$ coefficients.

  • Suppose that we can control iteratively these coefficients, by incrementing or decrementing them with a proportionality constant $\alpha$ (the larger $\alpha$, the faster the convergence but possiblely the oscillation, and viceversa). Let's denote $a_i^{[n]}$ the value of coeff $a_i$ at iteration $n$ of the algorithm. If happens, for example, that

    \begin{equation} \frac{\partial e^2[n]}{\partial a_i} < 0 \end{equation}

    (the squared prediction error for iteration $n$ has been increased compared to the previous iteration), then the prediction $\hat{s}[n]$ should have been larger, and viceversa. Let's define the iterative control of $\{a_i\}_{i=1}^M$ coefficients as

    \begin{equation} \Big\{a_i^{[n+1]} = a_i^{[n]} - \alpha\frac{\partial e^2[n]}{\partial a_i}\Big\}_{i=1}^M \tag{LMS_idea} \end{equation}

  • Applying derivatives we obtain for $a_i$ that

    \begin{equation} \frac{\partial e^2[n]}{\partial a_i} = \frac{\partial\Big(s[n] - \sum_{i=1}^M a_is[n-i]\Big)^2}{\partial a_i} = -2\Big(s[n] - \sum_{i=1}^M a_is[n-i]\Big)s[n-i] = -2e[n]s[n-i]. \end{equation}

  • Substituting this expression in Eq. (LMS_idea), we get that

    \begin{equation} \Big\{a_i^{[n+1]} = a_i^{[n]} +2\alpha e[n]s[n-i]\Big\}_{i=1}^M. \tag{LMS_control} \end{equation}

  • Notice that, in a backward adaptive prediction scheme, both elements $e[n]$ and $s[n-i]$ are known at iteration $n$ at both, the encoder and the decoder.

Lossy DPCM [Cutler, 1952]

  • The prediction error is quantized to decrease the variance

    \begin{equation} \tilde{e}[n] = Q(e[n]), \end{equation}

    where

    \begin{equation} e[n] = s[n] - \hat{\tilde{s}}[n], \end{equation}

    where

    \begin{equation} \tilde{s}[n] = \hat{\tilde{s}}[n] + \tilde{e}[n]. \end{equation}

  • Used in hybrid (lossy DPCM/transform) video coding.